{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fc51e166",
   "metadata": {
    "colab_type": "text",
    "id": "view-in-github"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/tomasonjo/blogs/blob/master/stock_diversity/Stock%20diversification%20analysis.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "300Nlo_zZsqu",
   "metadata": {
    "id": "300Nlo_zZsqu"
   },
   "source": [
    "* Updated to GDS 2.0 version\n",
    "* Link to original blog post: https://medium.com/neo4j/diversify-your-stock-portfolio-with-graph-analytics-4520a5e46b3d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "53697jijZ0GC",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "53697jijZ0GC",
    "outputId": "3b99569b-75d0-471a-da73-f1e8f9565e92"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Collecting neo4j\n",
      "  Downloading neo4j-4.4.2.tar.gz (89 kB)\n",
      "\u001b[?25l\r",
      "\u001b[K     |███▋                            | 10 kB 19.3 MB/s eta 0:00:01\r",
      "\u001b[K     |███████▎                        | 20 kB 21.9 MB/s eta 0:00:01\r",
      "\u001b[K     |███████████                     | 30 kB 24.1 MB/s eta 0:00:01\r",
      "\u001b[K     |██████████████▋                 | 40 kB 15.7 MB/s eta 0:00:01\r",
      "\u001b[K     |██████████████████▎             | 51 kB 13.6 MB/s eta 0:00:01\r",
      "\u001b[K     |██████████████████████          | 61 kB 15.7 MB/s eta 0:00:01\r",
      "\u001b[K     |█████████████████████████▋      | 71 kB 15.5 MB/s eta 0:00:01\r",
      "\u001b[K     |█████████████████████████████▎  | 81 kB 16.9 MB/s eta 0:00:01\r",
      "\u001b[K     |████████████████████████████████| 89 kB 4.6 MB/s \n",
      "\u001b[?25hRequirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from neo4j) (2018.9)\n",
      "Building wheels for collected packages: neo4j\n",
      "  Building wheel for neo4j (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
      "  Created wheel for neo4j: filename=neo4j-4.4.2-py3-none-any.whl size=115365 sha256=66fe589999cb438b038e007f7fc8e66d9f7fe722a9f04785acf90c0ec9da39a9\n",
      "  Stored in directory: /root/.cache/pip/wheels/10/d6/28/95029d7f69690dbc3b93e4933197357987de34fbd44b50a0e4\n",
      "Successfully built neo4j\n",
      "Installing collected packages: neo4j\n",
      "Successfully installed neo4j-4.4.2\n"
     ]
    }
   ],
   "source": [
    "!pip install neo4j"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f8e6193",
   "metadata": {
    "id": "9f8e6193"
   },
   "source": [
    "# Diversify your stock porftolio with graph anayltics\n",
    "A couple of weeks ago, I stumbled upon the stock market volume analysis in Neo4j by Bryant Avey. It got me interested in how we could use graph analytics to analyze stock markets. After a bit of research, I found Spread of risk across financial markets research paper. The authors infer a network between stocks by examining the correlation between stocks and then use community detection algorithms to help with diversifying stock portfolios. As a conclusion of the research paper, the authors argue that this technique could reduce risk by diversifying your investment and, interestingly, increasing your profits.\n",
    "\n",
    "_Disclaimer: This is not financial advice, and you should do your own research before investing_\n",
    "\n",
    "We will be using a subset of Kaggle's NASDAQ-100 Stock Price dataset. The dataset contains price and volume information of 102 securities for the last decade. For this post, I have prepared a subset CSV file that contains the stock price and volume information between May and September 2021.\n",
    "Each stock ticker will be represented as a separate node. We will store the price and volume information for each stock ticker as a linked list of stock trading days nodes. Using the linked list schema is a general graph model I use when modeling timeseries data in Neo4j. If you want to follow along with examples in this blog post, I suggest you open a blank project in Neo4j Sandbox. Neo4j Sandbox provides free cloud instances of Neo4j database that come pre-installed with both the  APOC and Graph Data Science plugins. You can copy the following Cypher query in Neo4j Browser to import the stock information."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "4c3ed092",
   "metadata": {
    "id": "4c3ed092"
   },
   "outputs": [],
   "source": [
    "# Define Neo4j connections\n",
    "import pandas as pd\n",
    "from neo4j import GraphDatabase\n",
    "host = 'bolt://3.231.25.240:7687'\n",
    "user = 'neo4j'\n",
    "password = 'hatchets-visitor-axes'\n",
    "driver = GraphDatabase.driver(host,auth=(user, password))\n",
    "\n",
    "def run_query(query):\n",
    "    with driver.session() as session:\n",
    "        result = session.run(query)\n",
    "        return pd.DataFrame([r.values() for r in result], columns=result.keys())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "fadf4253",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 49
    },
    "id": "fadf4253",
    "outputId": "ebea48b5-803e-4ca5-c0f6-6e9b253c3a4b"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: []\n",
       "Index: []"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run_query(\"\"\"\n",
    "LOAD CSV WITH HEADERS FROM \"https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/stocks/stock_prices.csv\" as row\n",
    "CALL {\n",
    "    WITH row\n",
    "    MERGE (s:Stock{name:row.Name})\n",
    "    CREATE (s)-[:TRADING_DAY]->(:StockTradingDay{date: date(row.Date), close:toFloat(row.Close), volume: toFloat(row.Volume)})\n",
    "} IN TRANSACTIONS\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d07fc53",
   "metadata": {
    "id": "6d07fc53"
   },
   "source": [
    "Next, we need to create a linked list between stock trading days nodes. We can easily create a linked list with the `apoc.nodes.link` procedure. We will also collect the closing prices by days of stocks and store them as a list property of the stock node."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "847602fd",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 81
    },
    "id": "847602fd",
    "outputId": "5b78bff3-d69b-4978-9885-c83f0ec44c18"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>result</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>done</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  result\n",
       "0   done"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run_query(\"\"\"\n",
    "MATCH (s:Stock)-[:TRADING_DAY]->(day)\n",
    "WITH s, day\n",
    "ORDER BY day.date ASC\n",
    "WITH s, collect(day) as nodes, collect(day.close) as closes\n",
    "SET s.close_array = closes\n",
    "WITH nodes\n",
    "CALL apoc.nodes.link(nodes, 'NEXT_DAY')\n",
    "RETURN distinct 'done' AS result\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11ffdb96",
   "metadata": {
    "id": "11ffdb96"
   },
   "source": [
    "# Inferring relationships based on the correlation coefficient\n",
    "We will use the Pearson similarity as the correlation metric. The authors of the above-mentioned research paper use more sophisticated correlation metrics, but that is beyond the scope of this blog post. The input to the Pearson similarity algorithm will be the ordered list of closing prices we produced in the previous step. The algorithm will calculate the correlation coefficient and store the results as relationships between most correlating stocks. I have used the topKparameter value of 3, so each stock will be connected to the three most correlating stock tickers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "jX9-Q53Jbfpl",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 81
    },
    "id": "jX9-Q53Jbfpl",
    "outputId": "526df57b-413b-44b5-9ac0-5650565b43c7"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>nodeProjection</th>\n",
       "      <th>relationshipProjection</th>\n",
       "      <th>graphName</th>\n",
       "      <th>nodeCount</th>\n",
       "      <th>relationshipCount</th>\n",
       "      <th>projectMillis</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>{'Stock': {'label': 'Stock', 'properties': {'c...</td>\n",
       "      <td>{'__ALL__': {'orientation': 'NATURAL', 'indexI...</td>\n",
       "      <td>stock</td>\n",
       "      <td>102</td>\n",
       "      <td>0</td>\n",
       "      <td>112</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      nodeProjection  \\\n",
       "0  {'Stock': {'label': 'Stock', 'properties': {'c...   \n",
       "\n",
       "                              relationshipProjection graphName  nodeCount  \\\n",
       "0  {'__ALL__': {'orientation': 'NATURAL', 'indexI...     stock        102   \n",
       "\n",
       "   relationshipCount  projectMillis  \n",
       "0                  0            112  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run_query(\"\"\"\n",
    "CALL gds.graph.project('stock', 'Stock', '*', {nodeProperties:['close_array']})\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "Heg_9139bm87",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "Heg_9139bm87",
    "outputId": "81555d81-20a7-48ee-90d8-7fe8ebf2125d"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ranIterations</th>\n",
       "      <th>nodePairsConsidered</th>\n",
       "      <th>didConverge</th>\n",
       "      <th>preProcessingMillis</th>\n",
       "      <th>computeMillis</th>\n",
       "      <th>mutateMillis</th>\n",
       "      <th>postProcessingMillis</th>\n",
       "      <th>nodesCompared</th>\n",
       "      <th>relationshipsWritten</th>\n",
       "      <th>similarityDistribution</th>\n",
       "      <th>configuration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>6</td>\n",
       "      <td>10221</td>\n",
       "      <td>True</td>\n",
       "      <td>0</td>\n",
       "      <td>70</td>\n",
       "      <td>3</td>\n",
       "      <td>-1</td>\n",
       "      <td>102</td>\n",
       "      <td>306</td>\n",
       "      <td>{'p1': 0.997222900390625, 'max': 0.99995803833...</td>\n",
       "      <td>{'topK': 3, 'maxIterations': 100, 'randomJoins...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   ranIterations  nodePairsConsidered  didConverge  preProcessingMillis  \\\n",
       "0              6                10221         True                    0   \n",
       "\n",
       "   computeMillis  mutateMillis  postProcessingMillis  nodesCompared  \\\n",
       "0             70             3                    -1            102   \n",
       "\n",
       "   relationshipsWritten                             similarityDistribution  \\\n",
       "0                   306  {'p1': 0.997222900390625, 'max': 0.99995803833...   \n",
       "\n",
       "                                       configuration  \n",
       "0  {'topK': 3, 'maxIterations': 100, 'randomJoins...  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run_query(\"\"\"\n",
    "CALL gds.knn.mutate('stock', \n",
    "  {nodeProperties:'close_array', topK:3, similarityCutoff: 0.2, mutateProperty:'score', mutateRelationshipType:'SIMILAR'})\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13a33d0c",
   "metadata": {
    "id": "13a33d0c"
   },
   "source": [
    "As mentioned, the algorithm produced new SIMILAR relationships between stock ticker nodes.\n",
    "We can now run a community detection algorithm to identify various clusters of correlating stocks. I have decided to use the Louvain Modularity in this example. The community ids will be stored as node properties."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "f59216e2",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 142
    },
    "id": "f59216e2",
    "outputId": "f63a33a5-d980-4ef8-8030-79d342cb2f7a"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>writeMillis</th>\n",
       "      <th>nodePropertiesWritten</th>\n",
       "      <th>modularity</th>\n",
       "      <th>modularities</th>\n",
       "      <th>ranLevels</th>\n",
       "      <th>communityCount</th>\n",
       "      <th>communityDistribution</th>\n",
       "      <th>postProcessingMillis</th>\n",
       "      <th>preProcessingMillis</th>\n",
       "      <th>computeMillis</th>\n",
       "      <th>configuration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>15</td>\n",
       "      <td>102</td>\n",
       "      <td>0.66782</td>\n",
       "      <td>[0.6678200692041523]</td>\n",
       "      <td>1</td>\n",
       "      <td>15</td>\n",
       "      <td>{'p99': 11, 'min': 2, 'max': 11, 'mean': 6.8, ...</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>641</td>\n",
       "      <td>{'maxIterations': 10, 'writeConcurrency': 4, '...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   writeMillis  nodePropertiesWritten  modularity          modularities  \\\n",
       "0           15                    102     0.66782  [0.6678200692041523]   \n",
       "\n",
       "   ranLevels  communityCount  \\\n",
       "0          1              15   \n",
       "\n",
       "                               communityDistribution  postProcessingMillis  \\\n",
       "0  {'p99': 11, 'min': 2, 'max': 11, 'mean': 6.8, ...                     3   \n",
       "\n",
       "   preProcessingMillis  computeMillis  \\\n",
       "0                    0            641   \n",
       "\n",
       "                                       configuration  \n",
       "0  {'maxIterations': 10, 'writeConcurrency': 4, '...  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run_query(\"\"\"\n",
    "CALL gds.louvain.write('stock', {\n",
    "  relationshipTypes:['SIMILAR'],\n",
    "  writeProperty:'louvain'\n",
    "})\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6e11de4",
   "metadata": {
    "id": "b6e11de4"
   },
   "source": [
    "With such small graphs, I find the best way to examine community detection results is to simply produce a network visualization. Following the research paper idea, you would want to invest in stocks from different communities to diversify your risk and increase profits. You could pick the stocks from each community using a linear regression slope to indicate their performance. I found there is a simple linear regression model available as an `apoc.math.regr` procedure. Read more about it in the documentation. Unfortunately, the developers had different data model in mind for performing linear regression, so we first have to adjust the graph model to fit the procedure input. In the first step, we add a secondary label to the stock trading days nodes that indicate the stock ticker it represents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "ca47c1d9",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 81
    },
    "id": "ca47c1d9",
    "outputId": "ecfc9ecc-2d7b-47aa-c602-c0deca510a0e"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>'done'</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>done</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  'done'\n",
       "0   done"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run_query(\"\"\"\n",
    "MATCH (s:Stock)-[:TRADING_DAY]->(day)\n",
    "CALL apoc.create.addLabels( day, [s.name]) YIELD node\n",
    "RETURN distinct 'done'\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef02a835",
   "metadata": {
    "id": "ef02a835"
   },
   "source": [
    "Next, we need to calculate the x-axis index values. We will simply assign an index value of zero to each stock's first trading day and increment the index value for each subsequent trading day."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "ccef7f2c",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 49
    },
    "id": "ccef7f2c",
    "outputId": "d639b087-951f-4974-9f8c-00383a644f29"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: []\n",
       "Index: []"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run_query(\"\"\"\n",
    "MATCH (s:Stock)-[:TRADING_DAY]->(day)\n",
    "WHERE NOT exists { ()-[:NEXT_DAY]->(day) }\n",
    "MATCH p=(day)-[:NEXT_DAY*0..]->(next_day)\n",
    "SET next_day.index = length(p)\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "58411c92",
   "metadata": {
    "id": "58411c92"
   },
   "source": [
    "Now that our graph model fits the linear regression procedure in APOC, we can go ahead and calculate the slope value of the fitted line. In a more serious setting, we would probably want to scale the closing prices, but we will skip it for this demonstration. The slope value will be stored as a node property."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2880d629",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 49
    },
    "id": "2880d629",
    "outputId": "332a06bc-7a46-4d71-b65e-f0707273da0d"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: []\n",
       "Index: []"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run_query(\"\"\"\n",
    "MATCH (s:Stock)\n",
    "CALL apoc.math.regr(s.name, 'close', 'index') YIELD slope\n",
    "SET s.slope = slope;\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "832b33cf",
   "metadata": {
    "id": "832b33cf"
   },
   "source": [
    "As a last step, we can recommend the top three performing stocks from each community."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "d3f6c3fe",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 332
    },
    "id": "d3f6c3fe",
    "outputId": "76873bfb-dbf0-429a-f7e3-08ae550700ce"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>community</th>\n",
       "      <th>potential_investments</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>66</td>\n",
       "      <td>[AMZN, PYPL, MTCH]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>44</td>\n",
       "      <td>[GOOG, GOOGL, ISRG]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>53</td>\n",
       "      <td>[BKNG, VRSN, TXN]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>18</td>\n",
       "      <td>[MELI, ASML, REGN]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>22</td>\n",
       "      <td>[CHTR, COST, VRSK]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>46</td>\n",
       "      <td>[IDXX, ADBE, INTU]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>58</td>\n",
       "      <td>[LRCX, AMGN, MAR]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>74</td>\n",
       "      <td>[ORLY, AVGO, CTAS]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>72</td>\n",
       "      <td>[NFLX, OKTA, WDAY]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>7</td>\n",
       "      <td>[ZM, BIIB, ADI]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>14</td>\n",
       "      <td>[KLAC, NXPI, SGEN]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>42</td>\n",
       "      <td>[HON, PEP, CHKP]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>77</td>\n",
       "      <td>[ADP, SBUX]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>39</td>\n",
       "      <td>[VRTX, ROST, FISV]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>48</td>\n",
       "      <td>[BIDU, PDD, NTES]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    community potential_investments\n",
       "0          66    [AMZN, PYPL, MTCH]\n",
       "1          44   [GOOG, GOOGL, ISRG]\n",
       "2          53     [BKNG, VRSN, TXN]\n",
       "3          18    [MELI, ASML, REGN]\n",
       "4          22    [CHTR, COST, VRSK]\n",
       "5          46    [IDXX, ADBE, INTU]\n",
       "6          58     [LRCX, AMGN, MAR]\n",
       "7          74    [ORLY, AVGO, CTAS]\n",
       "8          72    [NFLX, OKTA, WDAY]\n",
       "9           7       [ZM, BIIB, ADI]\n",
       "10         14    [KLAC, NXPI, SGEN]\n",
       "11         42      [HON, PEP, CHKP]\n",
       "12         77           [ADP, SBUX]\n",
       "13         39    [VRTX, ROST, FISV]\n",
       "14         48     [BIDU, PDD, NTES]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run_query(\"\"\"\n",
    "MATCH (s:Stock)\n",
    "WITH s.louvain AS community, s.slope AS slope, s.name AS ticker\n",
    "ORDER BY slope DESC\n",
    "RETURN community, collect(ticker)[..3] as potential_investments\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "848c6563",
   "metadata": {
    "id": "848c6563"
   },
   "source": [
    "# Conclusion\n",
    "This is not financial advice, do your own research before investing. Even so, in this blog post, I only looked at a 90-day window for 100 stocks. If you wanted to get more serious, you would probably want to collect a more extensive dataset and fine-tune the correlation coefficient calculation. Not only that, but a simple linear regression might not be the best indicator of stock performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "83c591d1",
   "metadata": {
    "id": "83c591d1"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "include_colab_link": true,
   "name": "Stock diversification analysis.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
